Level-3 Cholesky Kernel Subroutine of a Fully Portable High Performance Minimal Storage Hybrid Format Cholesky Algorithm
Abstract
The TOMS paper "A Fully Portable High Performance Minimal Storage Hybrid Format Cholesky Algorithm" by Andersen, Gunnels, Gustavson, Reid, and Waśniewski used a level-3 Cholesky kernel subroutine instead of the level-2 LAPACK routine POTF2. We discuss the merits of this approach and show that, on a variety of common platforms, it performs considerably better than a POTRF that is restricted to calling only POTF2.
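To make the role of the kernel concrete, here is a minimal pure-Python sketch of a right-looking blocked Cholesky factorization in which the diagonal-block step is a replaceable function. The names `unblocked_kernel` and `blocked_cholesky`, and the list-of-lists full storage, are assumptions made for illustration; they are not the paper's code or its hybrid storage format. In the POTRF described above, the diagonal-block step is the POTF2 call, and that is the slot a level-3 kernel would fill.

```python
import math


def unblocked_kernel(a, lo, hi):
    """Factor the diagonal block a[lo:hi][lo:hi] in place (lower triangle).

    Hypothetical stand-in for the diagonal-block routine; a tuned
    level-3 kernel would replace this function.
    """
    for j in range(lo, hi):
        d = a[j][j] - sum(a[j][k] ** 2 for k in range(lo, j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        a[j][j] = math.sqrt(d)
        for i in range(j + 1, hi):
            s = sum(a[i][k] * a[j][k] for k in range(lo, j))
            a[i][j] = (a[i][j] - s) / a[j][j]


def blocked_cholesky(a, nb, kernel=unblocked_kernel):
    """Right-looking blocked Cholesky A = L * L^T, computed in place.

    Only the lower triangle of `a` is read and overwritten with L;
    `nb` is the block size.
    """
    n = len(a)
    for lo in range(0, n, nb):
        hi = min(lo + nb, n)
        # 1. Factor the diagonal block (the step the kernel performs).
        kernel(a, lo, hi)
        # 2. Triangular solve for the panel below the diagonal block.
        for i in range(hi, n):
            for j in range(lo, hi):
                s = sum(a[i][k] * a[j][k] for k in range(lo, j))
                a[i][j] = (a[i][j] - s) / a[j][j]
        # 3. Symmetric rank-nb update of the trailing lower triangle.
        for i in range(hi, n):
            for j in range(hi, i + 1):
                a[i][j] -= sum(a[i][k] * a[j][k] for k in range(lo, hi))
    return a
```

Calling `blocked_cholesky(a, 2)` on the matrix with rows (4, 12, -16), (12, 37, -43), (-16, -43, 98) leaves the lower triangle holding L with rows (2), (6, 1), (-8, 5, 3). The sketch shows only where the kernel sits in the algorithm; it says nothing about performance, which depends on how the kernel and the level-3 BLAS steps are implemented.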
Similar resources
High Performance Cholesky Factorization via Blocking and Recursion That Uses Minimal Storage
We present a high-performance Cholesky factorization algorithm, called BPC for Blocked Packed Cholesky, which performs as well as or better than the LAPACK DPOTRF subroutine, but with about the same memory requirements as the LAPACK DPPTRF subroutine, which runs at level 2 BLAS speed. Algorithm BPC only calls DGEMM and level 3 kernel routines. It combines a recursive algorithm with blocking and ...
LAPACK Cholesky Routines in Rectangular Full Packed Format
We describe a new data format for storing triangular and symmetric matrices called RFP (Rectangular Full Packed). The standard two dimensional arrays of Fortran and C (also known as full format) that are used to store triangular and symmetric matrices waste half the storage space but provide high performance via the use of level 3 BLAS. Packed format arrays fully utilize storage (array space) b...
Rectangular Full Packed Format for LAPACK Algorithms Timings on Several Computers
We describe a new data format for storing triangular and symmetric matrices called RFP (Rectangular Full Packed). The standard two dimensional arrays of Fortran and C (also known as full format) that are used to store triangular and symmetric matrices waste nearly half the storage space but provide high performance via the use of level 3 BLAS. Standard packed format arrays fully utilize storage...
Optimizing Locality of Reference in Cholesky Algorithms
This paper presents the principal ideas involved in hierarchical blocking, introduces the block packed storage scheme, and gives the implementation details and the performance rates of the hierarchically blocked Cholesky factorization. In some cases the newly developed routines are faster by an order of magnitude than the corresponding LAPACK routines. Introduction Most current computers based ...
New Generalized Data Structures for Matrices Lead to a Variety of High Performance Algorithms
We describe new data structures for full storage of general matrices that generalize the current storage layouts of the Fortran and C programming languages. We also describe new data structures for full and packed storage of dense symmetric/triangular arrays that generalize both full and packed storage. Using the new data structures, one is led to several new algorithms that save “half” the sto...